Sara Shirinkam (yzc280)

Final Project: Pfizer Sentiment Analysis and Visualization

What do you want to do?

We perform sentiment analysis on tweet data to gauge whether people are willing to get the Pfizer vaccine.

Description of the dataset:

Source: Kaggle. The data was collected with the tweepy Python package, which accesses the Twitter API. The goal is to study the subjects of recent tweets about the vaccine developed jointly by Pfizer and BioNTech and to perform various NLP tasks on this data source.

Keywords: Sentiment analysis; COVID-19; Pfizer Vaccine

Introduction:

The COVID-19 pandemic has affected the whole world and overwhelmed healthcare systems that were unprepared for a crisis of this intensity. Having more accurate information about how many people are willing to get vaccinated is important for producing enough vaccine. The tweets in this data set do not have associated sentiment labels. In this study, I use visualization tools, exploratory data analysis, and time-based analysis to perform sentiment analysis and check whether people are willing to get the Pfizer vaccine. Specifically, I analyze tweets from December 2020 to April 2021 to study how people's interest in getting vaccinated changes over time, based on the number of positive, negative, and neutral tweets.

This can be difficult because COVID-19 is a new disease with many uncertainties, and scientists are still working on the vaccine. Because the proposed approach is data-oriented, it benefits from the actual behavior of the data and is less exposed to human error and potential bias. This research can help give patients and medical practitioners more accurate information about vaccine availability and scheduling, so that COVID-19 vaccine appointments can be scheduled efficiently. It also supports better management of expensive and scarce healthcare resources. The risk is that the results may not generalize. The payoff is an initial step toward a more accurate prediction of the amount of vaccine required.

In the next cell, we load the data and display the first three rows.

We can see that some users use as many as 151 keywords in their tweets. Let's look at some of these users. What does the overall distribution of the number of keywords per user look like?

Looking at the distribution of the number of keywords users include in their tweets, the vast majority use between 120 and 150 keywords. Very few users use more than 150 or fewer than 120 keywords, and we will likely drop such tweets because of the uncertainty in determining their representative top-level category.

Preprocess the data:

In the code below, we preprocess the text feature of our dataset, which contains the body of each tweet. We clean the text to reduce noise and avoid misreadings.
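As an illustration of this kind of cleaning, here is a minimal sketch. The function name `clean_tweet` and the specific rules (lowercasing, removing URLs, @mentions, and non-letter characters) are assumptions made for the example; the notebook's actual preprocessing may differ.

```python
import re

# Patterns for the pieces of a tweet that usually carry no sentiment.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Hypothetical helper: normalize one tweet body for text analysis."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    # Drop digits, punctuation, emoji, and non-ASCII letters.
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())       # collapse repeated whitespace


print(clean_tweet("Got my #Pfizer shot today! https://t.co/abc123 @CDCgov"))
```

One design point to keep in mind: lowercasing and stripping punctuation removes the capitalization and punctuation cues that VADER uses for emphasis, so a lighter cleaning (URLs and mentions only) may be preferable for the text that is passed to the sentiment scorer.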

VADER sentiment analysis:

In this step, I add three columns (positive, neutral, and negative sentiment scores) to the data set. VADER sentiment analysis relies on a dictionary that maps lexical features to emotion intensities, known as sentiment scores. The sentiment score of a text is obtained by summing the intensity of each word in the text. For example, words such as 'good', 'happy', and 'like' all convey positive sentiment. VADER also accounts for the basic context around these words, for example treating "did not enjoy" as a negative statement, and for emphasis from capitalization and punctuation, as in "LOVE."
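To make the lexicon idea concrete, the toy scorer below sums word intensities, flips the sign after a nearby negation, and boosts all-caps words. It is not VADER: the mini-lexicon, the boost and negation factors, and the function name `toy_score` are illustrative assumptions, and the constant 15 in the final squashing step reflects my understanding of how VADER normalizes its compound score rather than anything stated in this report.

```python
import math

# Hypothetical mini-lexicon; the valence values are illustrative only.
TOY_LEXICON = {
    "good": 1.9, "happy": 2.7, "like": 1.5, "love": 3.2,
    "enjoy": 2.2, "bad": -2.5, "afraid": -2.2,
}
NEGATIONS = {"not", "no", "never"}
CAPS_BOOST = 0.7        # illustrative emphasis for ALL-CAPS words
NEGATION_FACTOR = -0.7  # illustrative sign flip and damping
ALPHA = 15              # assumed normalization constant


def toy_score(text: str) -> float:
    """Return a score in (-1, 1); positive means positive sentiment."""
    tokens = [t.strip(".,!?\"'") for t in text.split()]
    tokens = [t for t in tokens if t]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token.lower())
        if valence is None:
            continue  # word not in the lexicon contributes nothing
        if token.isupper() and len(token) > 1:
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        # A negation within the two preceding tokens flips the sentiment.
        if any(w.lower() in NEGATIONS for w in tokens[max(0, i - 2):i]):
            valence *= NEGATION_FACTOR
        total += valence
    # Squash the raw sum into the open interval (-1, 1).
    return total / math.sqrt(total * total + ALPHA)


for sentence in ["I enjoy it", "I did not enjoy it", "I love it", "I LOVE it"]:
    print(sentence, round(toy_score(sentence), 3))
```

In the notebook itself a library implementation would presumably be used, for example `SentimentIntensityAnalyzer().polarity_scores(text)` from the vaderSentiment or NLTK packages, which returns `neg`, `neu`, `pos`, and `compound` scores. Which package the notebook uses is not stated here.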

Mapping sentiment into numeric and plotting:

To perform sentiment analysis, we add a column named "sentiment" to the data set that indicates whether each tweet is positive, negative, or neutral. Based on the plot below, there are more positive tweets (3355) than neutral (3075) or negative (1353) tweets. This suggests that more people are interested in getting the vaccine than are not. At the same time, the numbers of positive and neutral tweets are very close.
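One common way to derive such a label is to threshold a single compound score per tweet. The sketch below assumes that approach with a cutoff of 0.05, a conventional choice for VADER; the report does not state the rule actually used, and the names `label_from_compound` and `LABEL_TO_NUMBER` are hypothetical. The last part only recomputes shares from the counts quoted above.

```python
from collections import Counter

# Hypothetical numeric encoding of the three labels.
LABEL_TO_NUMBER = {"negative": -1, "neutral": 0, "positive": 1}


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a sentiment label (assumed rule)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores, for illustration only.
scores = [0.64, -0.30, 0.0, 0.02, 0.51]
labels = [label_from_compound(s) for s in scores]
numeric = [LABEL_TO_NUMBER[label] for label in labels]
counts = Counter(labels)
print(labels, numeric, counts)

# Shares implied by the counts reported in this section.
report_counts = {"positive": 3355, "neutral": 3075, "negative": 1353}
total = sum(report_counts.values())
shares = {label: n / total for label, n in report_counts.items()}
print(total, {label: round(share, 3) for label, share in shares.items()})
```

Expressed as shares of the 7,783 labeled tweets, the reported counts correspond to roughly 43% positive, 40% neutral, and 17% negative.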

Sorting the dates
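A minimal sketch of this step, assuming the timestamps are stored as strings in the form `YYYY-MM-DD HH:MM:SS` (the actual column format is not shown in this report), is to parse them into datetime objects before sorting so that the order is chronological rather than alphabetical:

```python
from datetime import datetime

# Made-up timestamps in an assumed format, for illustration only.
raw_dates = [
    "2021-01-08 14:03:11",
    "2020-12-12 09:15:00",
    "2021-04-09 21:40:52",
]

# Parse first, then sort, so the comparison is on real dates.
parsed = sorted(datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in raw_dates)

# Keep only the calendar day, which is what the per-day plots need.
days = [p.date().isoformat() for p in parsed]
print(days)
```

With a pandas DataFrame, the equivalent would typically be `pd.to_datetime` on the date column followed by `sort_values`.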

Visualization: Word cloud

This visualization shows the most common words among the most positive and negative tweets about the Pfizer vaccine from December 2020 to April 2021.
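A word cloud is essentially a picture of word frequencies, so the underlying computation can be sketched as a simple count. The stopword list, the sample tweets, and the helper name `top_words` below are made up for the example.

```python
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "my", "i", "it", "in", "for", "was"}


def top_words(texts, n=3):
    """Count non-stopword tokens across texts and return the n most common."""
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


# Made-up cleaned tweets standing in for the most positive ones.
positive_tweets = [
    "got my pfizer vaccine today",
    "grateful for the pfizer vaccine",
    "vaccine day",
]
print(top_words(positive_tweets, n=2))
```

A frequency table like this can be passed to a plotting library; the wordcloud package, for instance, offers `WordCloud().generate_from_frequencies(...)`, though whether the notebook uses that package is an assumption.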

Visualizing the number of tweets per day:

According to the graph below, the number of tweets per day (that is, people's willingness to post a positive, negative, or neutral tweet about the Pfizer vaccine) decreases from December 6, 2020, to April 9, 2021. One possible reason is that, by April, vaccine production had already been increasing for about a month (since March 2021) and the vaccine had become available to a broader range of ages and categories of people. The mean value is about 60 tweets per day. For the rest of this report, we analyze and visualize the daily pattern of positive, negative, and neutral tweets to check whether people's willingness to get the vaccine changes over time.
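The daily series and its mean can be computed as in the following sketch. The helper name `tweets_per_day` and the sample dates are hypothetical; days with no tweets are filled with zero here so that the mean is taken over the whole period, which is one possible convention and not necessarily the one used for the graph.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean


def tweets_per_day(days):
    """Return {day: count} for every calendar day between the first and last tweet."""
    counts = Counter(days)
    if not counts:
        return {}
    start, end = min(counts), max(counts)
    span = (end - start).days + 1
    return {
        start + timedelta(days=offset): counts.get(start + timedelta(days=offset), 0)
        for offset in range(span)
    }


# Made-up posting days, for illustration only.
sample_days = (
    [date(2020, 12, 6)] * 3 + [date(2020, 12, 7)] + [date(2020, 12, 9)] * 2
)
series = tweets_per_day(sample_days)
daily_mean = mean(list(series.values()))
print(series, daily_mean)

# Cross-check of the reported mean using the figures quoted in this report.
period_days = (date(2021, 4, 9) - date(2020, 12, 6)).days + 1
reported_total = 3355 + 3075 + 1353
print(period_days, round(reported_total / period_days, 1))
```

As a rough cross-check, the 7,783 labeled tweets reported earlier, spread over the 125 days from December 6, 2020, to April 9, 2021, give about 62 tweets per day, which is in line with the mean of about 60.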

Visualizing the number of positive tweets per day:

According to the graph below, the number of positive tweets decreases from December 2020 to April 2021. It appears that the Pfizer vaccine is no longer as hot a topic as it was in December 2020 and January 2021. The mean value is about 30 tweets per day.

The largest number of positive tweets (109) was posted on January 8, 2021. What happened on January 8 to cause this increase?

Pfizer Inc (PFE.N) announced that, according to a laboratory study conducted by the U.S. drugmaker, the Pfizer COVID-19 vaccine can protect against a key mutation in the highly transmissible new variants of the coronavirus discovered in Britain and South Africa. The news page can be found by running the cell below.

Visualizing the number of negative tweets per day:

According to the graph below, the number of negative tweets decreases from December 2020 to April 2021. It appears that the Pfizer vaccine is no longer as hot a topic as it was in December 2020 and January 2021. The mean value is about 11 tweets per day. The maximum number of negative tweets (51) was posted on January 9, in the same week as the maximum number of positive tweets. In general, there are fewer negative tweets than positive tweets. Therefore, we can conclude that people are more willing to get vaccinated than not.
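The per-sentiment figures quoted in these sections (the daily mean and the peak day) can be obtained with a small grouping step such as the sketch below. The function name `daily_summary` and the sample records are made up, and the mean here is taken over days that have at least one tweet of the given sentiment, which is an assumption about how the reported means were computed.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean


def daily_summary(records):
    """Summarize (day, label) records per sentiment label.

    Returns, for each label, the mean count over active days
    and the day with the highest count.
    """
    per_label = defaultdict(Counter)
    for day, label in records:
        per_label[label][day] += 1
    summary = {}
    for label, counts in per_label.items():
        peak_day, peak_count = max(counts.items(), key=lambda item: item[1])
        summary[label] = {
            "mean_per_active_day": mean(list(counts.values())),
            "peak_day": peak_day,
            "peak_count": peak_count,
        }
    return summary


# Made-up labeled tweets, for illustration only.
jan8, jan9 = date(2021, 1, 8), date(2021, 1, 9)
records = (
    [(jan8, "positive")] * 3 + [(jan9, "positive")]
    + [(jan9, "negative")] * 2 + [(jan8, "negative")]
)
summary = daily_summary(records)
for label, stats in summary.items():
    print(label, stats)
```

Comparing `mean_per_active_day` across labels gives the kind of positive-versus-negative comparison made above.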

Visualizing the number of neutral tweets per day:

According to the graph below, the number of neutral tweets decreases from December 2020 to April 2021. The patterns of neutral and positive tweets look very similar. Therefore, we can conclude that the number of people who are willing to get the vaccine is very close to the number who are neutral about it.

Conclusion:

In this project, I studied sentiment over time in tweets about the Pfizer vaccine. I analyzed the pattern of positive, negative, and neutral tweets from December 2020 to April 2021. The visualizations show that there are fewer negative tweets (a mean of about 11 per day) than positive tweets (a mean of about 30 per day). Therefore, we can conclude that people are more willing to get vaccinated over time.